60 research outputs found

    Equilibrium: Optimization of Ceph Cluster Storage by Size-Aware Shard Balancing

    Worldwide, storage demands and costs are increasing. Because of fault tolerance requirements, storage device heterogeneity, and data-center-specific constraints, the integrated balancing algorithm of the distributed storage cluster system Ceph cannot achieve optimal storage capacity utilization. This work presents Equilibrium, a shard balancing algorithm that is aware of both device utilization and shard size. With extensive experiments we demonstrate that our proposed algorithm balances near optimally on real-world clusters, substantially increasing available storage capacity while reducing the amount of data movement needed. Source code: https://github.com/TheJJ/ceph-balance

    Tiny Classifier Circuits: Evolving Accelerators for Tabular Data

    A typical machine learning (ML) development cycle for edge computing is to maximise the performance during model training and then minimise the memory/area footprint of the trained model for deployment on edge devices targeting CPUs, GPUs, microcontrollers, or custom hardware accelerators. This paper proposes a methodology for automatically generating predictor circuits for classification of tabular data with comparable prediction performance to conventional ML techniques while using substantially fewer hardware resources and power. The proposed methodology uses an evolutionary algorithm to search over the space of logic gates and automatically generates a classifier circuit with maximised training prediction accuracy. Classifier circuits are so tiny (i.e., consisting of no more than 300 logic gates) that they are called "Tiny Classifier" circuits, and can efficiently be implemented in an ASIC or on an FPGA. We empirically evaluate the automatic Tiny Classifier circuit generation methodology, or "Auto Tiny Classifiers", on a wide range of tabular datasets, and compare it against conventional ML techniques such as Amazon's AutoGluon, Google's TabNet and a neural search over Multi-Layer Perceptrons. Despite Tiny Classifiers being constrained to a few hundred logic gates, we observe no statistically significant difference in prediction performance in comparison to the best-performing ML baseline. When synthesised as a silicon chip, Tiny Classifiers use 8-18x less area and 4-8x less power. When implemented as an ultra-low cost chip on a flexible substrate (i.e., FlexIC), they occupy 10-75x less area and consume 13-75x less power compared to the most hardware-efficient ML baseline. On an FPGA, Tiny Classifiers consume 3-11x fewer resources. Comment: 14 pages, 16 figures
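As a rough illustration of the kind of search this abstract describes, the sketch below evolves a small feed-forward netlist of two-input NAND gates against binary-encoded rows. It is not the paper's method: the NAND-only gate set, the single-gate mutation, the elitist acceptance rule, the binary input encoding, and the names `evaluate`, `accuracy`, `random_gate` and `evolve` are all assumptions made for this example.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical encoding: each gate is a NAND of two earlier signals,
# where signals 0..n_inputs-1 are the inputs and later ones are gate outputs.
Gate = Tuple[int, int]


def evaluate(circuit: Sequence[Gate], inputs: Sequence[int]) -> int:
    """Run the netlist on one binary row; the last gate is the class output."""
    signals = list(inputs)
    for a, b in circuit:
        signals.append(1 - (signals[a] & signals[b]))
    return signals[-1]


def accuracy(circuit: Sequence[Gate], rows, labels) -> float:
    """Fraction of rows whose circuit output equals the label."""
    hits = sum(evaluate(circuit, row) == y for row, y in zip(rows, labels))
    return hits / len(rows)


def random_gate(position: int, n_inputs: int, rng: random.Random) -> Gate:
    """Pick two sources among signals that already exist at this position."""
    limit = n_inputs + position  # keeps the circuit feed-forward
    return (rng.randrange(limit), rng.randrange(limit))


def evolve(rows, labels, n_gates: int = 6, generations: int = 500, seed: int = 0):
    """Mutate one gate per generation; keep the child if it is no worse."""
    rng = random.Random(seed)
    n_inputs = len(rows[0])
    best: List[Gate] = [random_gate(i, n_inputs, rng) for i in range(n_gates)]
    best_acc = accuracy(best, rows, labels)
    for _ in range(generations):
        child = list(best)
        i = rng.randrange(n_gates)
        child[i] = random_gate(i, n_inputs, rng)
        acc = accuracy(child, rows, labels)
        if acc >= best_acc:
            best, best_acc = child, acc
    return best, best_acc


if __name__ == "__main__":
    rows = [[0, 0], [0, 1], [1, 0], [1, 1]]
    labels = [0, 1, 1, 0]  # XOR as a toy target
    circuit, acc = evolve(rows, labels)
    print(circuit, acc)
```

Accepting children of equal accuracy lets the search drift across neutral mutations, which is a common choice in gate-level evolution; whether the paper does the same is not stated in the abstract.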

    Performing Human Rights in Neoliberal Asia: Artistic and Activist Engagements in Hong Kong, Malaysia, and Singapore

    This dissertation examines artistic and activist performances that address issues of rights abuses in Hong Kong, Malaysia, and Singapore. I demonstrate how the centralized ruling parties encourage the neoliberalization of their economies while maintaining autocratic rule, thus intensifying structural inequalities while also clamping down on dissent. This condition then exacerbates the lack of labor, sexuality, and democratic rights. Concurrently, the states’ aspirations to be part of the global capitalist market have paradoxically provided conditional spaces of political and artistic expression. I contend that existing critiques of human rights from sociological and legalistic perspectives are inadequate for contemplating this state of affairs. My intervention thus lies in examining how the lens of performance studies reveals the fraught significance of rights claims in the region. My case studies show how authoritarian neoliberalism has created peculiar scenarios where queer subjects are legally criminalized yet desired as economic generators, resulting in the proliferation of queer theatre and businesses; where low-waged migrant workers are exploited even while the state and the market fund theatre initiatives addressing the issue; and where aspirational practices of democracy are seen in the structures of artistic rather than electoral processes. By deciphering the dramaturgical strategies of works of theatre, installation art, and photography, as well as participatory street protests and demonstrations, I argue that, by means of their embodiment, artistic and activist practices not only viscerally confront the urgency of addressing injustice but also manifest the particularities of the contexts in which they occur. In conclusion, I posit that a performative framework of human rights moves the judgment of its efficacy past that of legislative possibilities to how it enables nuanced agential shifts in the participants’ political subjectivities.
As such, I see how the artists and activists in the quest for rights claims are constantly trying to strike a balance between resisting and being co-opted by authoritarian states in neoliberal Asia.

    Efficient Communication Scheduling with Re-routing based on Collision Graphs

    Parallel systems are increasingly being used in applications that require high throughput or have real-time deadlines because of their potential for computation time savings. However, these savings are often offset by the communication overhead inherent in such systems. In this paper, such a communication overhead was encountered while performing simulations of partial differential equations (representing fluid dynamics problems) by using the multi-dimensional wave filters method. With tightly-coupled architectures as the platform, the static communication scheduling of messages in the network is addressed. Static communication scheduling is the compile-time determination of when nodes should send their messages to other nodes in the network. Additionally, the routing of these messages is also addressed. Although the static scheduling of computational tasks has been studied for some time, our problem is very new. This paper utilizes the newly developed Collision Graph ..

    Hardware/Software Co-design With the HMS Framework

    Hardware/Software co-design is an increasingly common design style for integrated circuits. It allows the majority of a system to be designed quickly with standardized parts, while special purpose hardware is used for the time critical portions of the system. The framework considered in this paper performs Hardware/Multi-Software (HMS) co-design for iterative loops, given an input specification that includes the system to be built, the number of available processors, the total chip area, and the required response time. Originally, all operations are done in software. The system then substitutes hardware (adder, multiplier, bus) for software based on the needability of each type of hardware unit. After a new hardware unit is introduced, the system is rescheduled using a variation of rotation scheduling in which operations may be moved between processors. Experimental results are shown that illustrate the efficiency of the algorithms as well as the savings achieved.

    Bus Minimization and Scheduling of Multi-Chip Systems

    This paper considers several different algorithms that reduce the required number of buses for multi-chip module design. An efficient polynomial time algorithm that calculates the minimum number of buses needed given a particular schedule is presented. We also present three algorithms that minimize the number of buses during scheduling. Experimental results are shown that illustrate the efficiency of the algorithms.
    1 Introduction. The design of computer systems consisting of several chips is referred to as multi-chip module (MCM) design. Such systems are becoming increasingly important for several reasons. First, the complexity and functionality of computer systems being built is increasing at a dramatic rate. This makes it very difficult for many systems to be built in a single chip even with the most advanced computer-aided design tools. Second, multi-chip modules permit designs to be modular. Hence, design time can be dramatically reduced. Finally, multiple chips increase the test..
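To make the "minimum number of buses needed given a particular schedule" idea concrete, here is a minimal sketch under a simplified model that is an assumption of this example, not the paper's formulation: each inter-chip transfer occupies exactly one bus for a half-open interval of control steps `[start, end)`, and any bus can carry any transfer. Under that model the minimum is the peak number of overlapping transfers, which a sweep over sorted start and end events finds in O(n log n) time. The function name `min_buses` is hypothetical.

```python
from typing import List, Tuple


def min_buses(transfers: List[Tuple[int, int]]) -> int:
    """Peak number of simultaneously active transfers.

    Each transfer is (start, end) in control steps, active on [start, end).
    Assumes one bus per active transfer and fully interchangeable buses.
    """
    events = []
    for start, end in transfers:
        events.append((start, 1))   # transfer claims a bus
        events.append((end, -1))    # transfer releases its bus
    # Tuples sort by step, then by delta, so a release at step t is
    # processed before a claim at the same step (intervals are half-open).
    events.sort()
    busy = peak = 0
    for _, delta in events:
        busy += delta
        peak = max(peak, busy)
    return peak


if __name__ == "__main__":
    schedule = [(0, 2), (1, 3), (2, 4)]
    print(min_buses(schedule))
```

In the example schedule the first and third transfers do not overlap, so they can share a bus with the second transfer using another.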